NSF PAR Search | NSF Public Access Repository

A Large Model’s Ability to Identify 3D Objects as a Function of Viewing Angle

https://doi.org/10.1109/AIxVR59861.2024.00006

Rubinstein, Jacob; Ferraro, Francis; Matuszek, Cynthia; Engel, Don (January 2024, Proceedings of the IEEE Artificial Intelligence x Virtual Reality (AIxVR) Conference)

Virtual reality is increasingly used to support embodied AI agents, such as robots, which frequently rely on ‘sim-to-real’ learning approaches. At the same time, tools such as large vision-and-language models offer new capabilities that apply to a wide variety of tasks. In order to understand how such agents can learn from simulated environments, we explore a language model’s ability to recover the type of object represented by a photorealistic 3D model as a function of the 3D perspective from which the model is viewed. We used photogrammetry to create 3D models of commonplace objects and rendered 2D images of these models from a fixed set of 420 virtual camera perspectives. A well-studied image and language model (CLIP) was used to generate text (i.e., prompts) corresponding to these images. Using multiple instances of various object classes, we studied which camera perspectives were most likely to return accurate text categorizations for each class of object.
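The per-perspective analysis described in this abstract can be illustrated with a small tally. The sketch below is not the authors' code: it assumes the model's predictions have already been collected as hypothetical `(true_label, view_id, predicted_label)` records, and the function names and toy data are invented for illustration. It computes categorization accuracy for each (object class, camera perspective) pair and then picks the most accurate perspective per class.

```python
from collections import defaultdict


def accuracy_by_class_and_view(records):
    """Return {(true_label, view_id): accuracy} from prediction records.

    Each record is a hypothetical (true_label, view_id, predicted_label)
    tuple; this layout is an assumption, not taken from the paper.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for true_label, view_id, predicted_label in records:
        key = (true_label, view_id)
        totals[key] += 1
        if predicted_label == true_label:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


def best_view_per_class(acc):
    """Return {true_label: view_id} with the highest accuracy per class.

    Ties keep the first view encountered.
    """
    best = {}
    for (label, view_id), score in acc.items():
        if label not in best or score > acc[(label, best[label])]:
            best[label] = view_id
    return best


# Toy data (invented): two object classes seen from two camera views.
toy_records = [
    ("mug", 0, "mug"), ("mug", 0, "mug"),
    ("mug", 1, "mug"), ("mug", 1, "bowl"),
    ("shoe", 0, "hat"), ("shoe", 1, "shoe"),
]

acc = accuracy_by_class_and_view(toy_records)
best = best_view_per_class(acc)
```

With real data, `view_id` would range over the 420 camera perspectives and each class would contribute several object instances per view.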
Full Text Available
A Collaborative Building Task in VR vs. Reality

Higgins, Padraig; Barron, Ryan; Lukin, Stephanie; Engel, Don; Matuszek, Cynthia (November 2023, Proceedings of the International Symposium on Experimental Robotics)

Human-robot interaction is a critical area of research, providing support for collaborative tasks where a human instructs a robot to interact with and manipulate objects in an environment. However, an under-explored class of these collaborative manipulation tasks is small-scale building exercises, in which the human and robot work together in close proximity with the same set of objects. Under these conditions, it is essential to ensure the human’s safety and mitigate comfort risks during the interaction. As there is danger in exposing humans to untested robots, a safe and controlled environment is required. Simulation and virtual reality (VR) for HRI have proven to be suitable tools for creating space for human-robot experimentation that can be beneficial in these scenarios. However, the use of simulation and VR comes with the possibility of failures resulting from the sim-to-real gap, where the behavior of the simulated robot may not accurately reflect the experience of a human collaborator in a real-world setting. This gap can limit the generalizability of research findings and raise questions about the validity of using simulation and VR for HRI research. Our goal in this work is to demonstrate the effectiveness of sim-to-real approaches for contact-based human-robot interaction.
Full Text Available
Photogrammetry and VR for Comparing 2D and Immersive Linguistic Data Collection (Student Abstract)

https://doi.org/10.1609/aaai.v37i13.27016

Rubinstein, Jacob; Matuszek, Cynthia; Engel, Don (September 2023, Proceedings of the AAAI Conference on Artificial Intelligence)

The overarching goal of this work is to enable the collection of language describing a wide variety of objects viewed in virtual reality. We aim to create full 3D models from a small number of ‘keyframe’ images of objects found in the publicly available Grounded Language Dataset (GoLD) using photogrammetry. We will then collect linguistic descriptions by placing our models in virtual reality and having volunteers describe them. To evaluate the impact of virtual reality immersion on linguistic descriptions of the objects, we intend to apply contrastive learning to perform grounded language learning, then compare the descriptions collected from images (in GoLD) versus our models.
Full Text Available
Mobile augmented reality system for object detection, alert, and safety

https://doi.org/10.2352/EI.2023.35.12.ERVR-218

Sharma, Sharad; Engel, Don (January 2023, Electronic Imaging)

Full Text Available
Lessons From A Small-Scale Robot Joining Experiment in VR

Higgins, Padraig; Barron, Ryan; Engel, Don; Matuszek, Cynthia (March 2023, The Human-Robot Interaction Conference (HRI) 6th Int'l Workshop on Virtual, Augmented, and Mixed-Reality for Human-Robot Interactions (VAM-HRI))

In this paper, we present a shared manipulation task performed both in virtual reality with a simulated robot and in the real world with a physical robot. We chose a collaborative assembly task in which the human and robot work together to construct a simple electrical circuit. While there are platforms available for conducting human-robot interactions using virtual reality, there has not been significant work investigating how it can influence human perception of tasks that are typically done in person. We present an overview of the simulation environment used, describe the paired experiment being performed, and finally enumerate a set of design desiderata to be considered when conducting sim2real experiments involving humans in a virtual setting.
A Spoken Language Dataset of Descriptions for Speech-Based Grounded Language Learning

Kebe, Gaoussou Y.; Higgins, Padraig; Jenkins, Patrick; Darvish, Kasra; Sachdeva, Rishabh; Barron, Ryan; Winder, John; Engel, Don; Raff, Edward; Ferraro, Francis; et al (December 2021, Advances in neural information processing systems)

Grounded language acquisition is a major area of research combining aspects of natural language processing, computer vision, and signal processing, compounded by domain issues requiring sample efficiency and other deployment constraints. In this work, we present a multimodal dataset of RGB+depth objects with spoken as well as textual descriptions. We analyze the differences between the two types of descriptive language and our experiments demonstrate that the different modalities affect learning. This will enable researchers studying the intersection of robotics, NLP, and HCI to better investigate how the multiple modalities of image, depth, text, speech, and transcription interact, as well as how differences in the vernacular of these modalities impact results.
Full Text Available
Towards Making Virtual Human-Robot Interaction a Reality

Higgins, Padraig; Kebe, Gaoussou Youssouf; Berlier, Adam; Darvish, Kasra; Engel, Don; Ferraro, Francis; Matuszek, Cynthia (March 2021, Proc. of the 3rd International Workshop on Virtual, Augmented, and Mixed-Reality for Human-Robot Interactions (VAM-HRI))

For robots deployed in human-centric spaces, natural language promises an intuitive, natural interface. However, obtaining appropriate training data for grounded language in a variety of settings is a significant barrier. In this work, we describe using human-robot interactions in virtual reality to train a robot, combining fully simulated sensing and actuation with human interaction. We present the architecture of our simulator and our grounded language learning approach, then describe our intended initial experiments.
Full Text Available
